Abstract

We study the tail bound of the empirical covariance of the multivariate normal distribution. Following the work
of (Gittens & Tropp, 2011), we provide a tail bound with a small constant.

1 Main result 

Let $\{\xi_k : k = 1, \dots, n\}$ follow the multivariate normal distribution $\mathcal{N}_d(0, C)$. The scatter matrix $S = \sum_{k=1}^{n} \xi_k \xi_k^\top$ follows the
Wishart distribution $\mathcal{W}_d(n, C)$. The estimate of $C$ is $\frac{1}{n}S$. The tail bound of $S$ has a wide range of applications,
such as the sample estimation of random projection. We follow the work of (Gittens & Tropp, 2011) to find the tail
bound with smaller constants.

Notation: We denote the $\ell$-th largest eigenvalue of a matrix $X$ by $\lambda_\ell(X)$, the trace of $X$ by $\mathrm{tr}(X)$, and the spectral
norm of $X$ by $\|X\|$.
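Theorem 1. If $S$ follows a Wishart distribution $\mathcal{W}_d(n, C)$, then for $\theta > 0$,

$$\Pr\Big\{ \lambda_1\big(\tfrac{1}{n}S - C\big) \ge \Big( \sqrt{\tfrac{2\theta(r+1)}{n}} + \tfrac{2\theta r}{n} \Big) \lambda_1(C) \Big\} \le d\exp\{-\theta\}, \tag{1}$$

$$\Pr\Big\{ \lambda_1\big(C - \tfrac{1}{n}S\big) \ge \Big( \sqrt{\tfrac{2\theta(r+1)}{n}} + \tfrac{2\theta r}{n} \Big) \lambda_1(C) \Big\} \le d\exp\{-\theta\}, \tag{2}$$

$$\Pr\Big\{ \big\|\tfrac{1}{n}S - C\big\| \ge \Big( \sqrt{\tfrac{2\theta(r+1)}{n}} + \tfrac{2\theta r}{n} \Big) \|C\| \Big\} \le 2d\exp\{-\theta\}, \tag{3}$$

$$\Pr\Big\{ \exists\, \ell \in \{1, \dots, d\} : \big|\lambda_\ell\big(\tfrac{1}{n}S\big) - \lambda_\ell(C)\big| \ge \Big( \sqrt{\tfrac{2\theta(r+1)}{n}} + \tfrac{2\theta r}{n} \Big) \kappa_\ell \lambda_\ell(C) \Big\} \le 2d\exp\{-\theta\}, \tag{4}$$

where $r = \mathrm{tr}(C)/\|C\|$, and the condition numbers are $\kappa_\ell = \lambda_1(C)/\lambda_\ell(C)$.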
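As a quick numerical illustration of Theorem 1, the sketch below evaluates the relative deviation level $\sqrt{2\theta(r+1)/n} + 2\theta r/n$ appearing in Eq. (3) from the eigenvalues of $C$, and picks $\theta = \log(2d/\delta)$ so that $2d\exp\{-\theta\}$ equals a target failure probability $\delta$. The function names and the choice to pass the eigenvalues as a plain list are assumptions made for illustration; they are not part of the note.

```python
import math

def effective_rank(eigenvalues):
    """r = tr(C) / ||C||, computed from the eigenvalues of a PSD matrix C."""
    return sum(eigenvalues) / max(eigenvalues)

def relative_deviation(eigenvalues, n, theta):
    """Factor multiplying ||C|| in Eq. (3): sqrt(2*theta*(r+1)/n) + 2*theta*r/n."""
    r = effective_rank(eigenvalues)
    return math.sqrt(2.0 * theta * (r + 1.0) / n) + 2.0 * theta * r / n

def theta_for_confidence(d, delta):
    """Solve 2*d*exp(-theta) = delta for theta."""
    return math.log(2.0 * d / delta)

# Hypothetical example: d = 3 with eigenvalues 4, 2, 1 and n = 1000 samples.
eigs = [4.0, 2.0, 1.0]
theta = theta_for_confidence(len(eigs), 0.05)
print(relative_deviation(eigs, 1000, theta))
```

Under Eq. (3), $\|\frac{1}{n}S - C\|$ stays below the printed factor times $\|C\|$ with probability at least $1 - \delta$.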

Remark 1. When $d = 1$ and $C = 1$, we have $r = 1$, and the bound is exactly the upper bound for the chi-square distribution provided
in (Laurent & Massart, 2000).
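Remark 1 can be checked by simulation. The sketch below draws chi-square samples as sums of squared standard normals and compares the empirical frequency of the upper-tail event of Theorem 1 (with $d = 1$, $C = 1$, $r = 1$) against $\exp\{-\theta\}$. The sample sizes, the seed, and the function name are illustrative assumptions.

```python
import math
import random

def chi_square_exceedance(n, theta, trials=2000, seed=0):
    """Empirical frequency of (1/n) * chi2_n - 1 >= 2*sqrt(theta/n) + 2*theta/n.

    With d = 1, C = 1, r = 1 the deviation sqrt(2*theta*(r+1)/n) + 2*theta*r/n
    reduces to 2*sqrt(theta/n) + 2*theta/n.
    """
    rng = random.Random(seed)
    threshold = 2.0 * math.sqrt(theta / n) + 2.0 * theta / n
    hits = 0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))  # chi-square, n dof
        if s / n - 1.0 >= threshold:
            hits += 1
    return hits / trials

freq = chi_square_exceedance(n=20, theta=1.0)
print(freq, "vs bound", math.exp(-1.0))
```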

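Remark 2. Applying the modification in this note to Theorem 7.1 of (Gittens & Tropp, 2011), we have

$$\Pr\Big\{ \lambda_\ell\big(\tfrac{1}{n}S\big) \ge \Big( 1 + \sqrt{\tfrac{2\theta(\kappa_\ell r_\ell + 2)}{n}} + \tfrac{2\theta \kappa_\ell r_\ell}{n} \Big) \lambda_\ell(C) \Big\} \le (d - \ell + 1)\exp\{-\theta\}, \quad \text{for } \ell = 1, \dots, d, \tag{5}$$

$$\Pr\Big\{ \lambda_\ell\big(\tfrac{1}{n}S\big) \le \Big( 1 - \sqrt{\tfrac{2\theta \kappa_\ell^2 (r_1 - r_{\ell+1} + 2)}{n}} \Big) \lambda_\ell(C) \Big\} \le \ell\exp\{-\theta\}, \quad \text{for } \ell = 1, \dots, d, \tag{6}$$

where $r_\ell = \sum_{i=\ell}^{d} \lambda_i(C) / \lambda_1(C)$ (with $r_{d+1} = 0$). As $r_\ell$ is no larger than $r$, these bounds are tighter individually. Eq. (5) and (6) are individual
eigenvalue bounds, but Eq. (4) is the collective eigenvalue bound. When $\ell = 1$, $\kappa_1 = 1$ and $r_1 = r$, so the upper
bound on the top eigenvalue in Eq. (5) is slightly looser than that of Eq. (1).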



2 Proof 

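We use part of the proof of Lemma 8 in (Birgé & Massart, 1998).
Lemma 2. Let $B > 0$ and $\sigma > 0$. If the log-moment generating function satisfies

$$\log \mathbb{E}\exp\{uZ\} \le \frac{\sigma^2 u^2}{2(1 - uB)} \quad \text{for all } 0 < u < 1/B,$$

then

$$\Pr\{Z \ge \epsilon\} \le \exp\Big\{-\frac{\epsilon^2}{2\sigma^2 + 2\epsilon B}\Big\} \quad \text{for all } \epsilon > 0, \tag{7}$$

and

$$\Pr\big\{Z \ge \sqrt{2\theta\sigma^2} + \theta B\big\} \le \exp\{-\theta\} \quad \text{for all } \theta > 0. \tag{8}$$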
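Proof. It follows from Markov's inequality that

$$\Pr\{Z \ge \epsilon\} \le \inf_{0<u<1/B} \mathbb{E}\exp\{-u\epsilon + uZ\} \le \exp\{-h(\epsilon)\},$$

where $h(\epsilon) := \sup_{0<u<1/B} \Big( u\epsilon - \frac{\sigma^2 u^2}{2(1-uB)} \Big)$. Also, the supremum is achieved for

$$\epsilon = \frac{\sigma^2 u}{1-uB} + \frac{\sigma^2 u^2 B}{2(1-uB)^2} = \frac{\sigma^2 u}{2(1-uB)} + \frac{\sigma^2 u}{2(1-uB)^2},$$

i.e. $u = B^{-1}\big[1 - \sigma(2\epsilon B + \sigma^2)^{-1/2}\big] < 1/B$. Then we prove Eq. (7), as

$$h(\epsilon) = \frac{\epsilon^2}{\epsilon B + \sigma^2 + \sigma^2(1 + 2\epsilon B/\sigma^2)^{1/2}} \ge \frac{\epsilon^2}{2\epsilon B + 2\sigma^2}.$$

Let $\theta = h(\epsilon)$, which at the maximizing $u$ equals $\frac{\sigma^2 u^2}{2(1-uB)^2}$. Then we prove Eq. (8), as

$$\sqrt{2\theta\sigma^2} + \theta B = \frac{\sigma^2 u}{1-uB} + \frac{\sigma^2 u^2 B}{2(1-uB)^2} = \epsilon.$$

□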
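The closed forms in the proof of Lemma 2 are easy to sanity-check numerically. The sketch below evaluates $h(\epsilon)$ at the stated maximizer $u = B^{-1}[1 - \sigma(2\epsilon B + \sigma^2)^{-1/2}]$ and compares it with the exponent $\epsilon^2/(2\sigma^2 + 2\epsilon B)$ of Eq. (7) and with $\theta$ when $\epsilon = \sqrt{2\theta\sigma^2} + \theta B$ as in Eq. (8). The function names and the test values of $\sigma$, $B$, $\theta$ are assumptions chosen for illustration.

```python
import math

def h(eps, sigma, B):
    """u*eps - sigma^2 u^2 / (2 (1 - u B)) at the maximizer
    u = (1 - sigma / sqrt(2 eps B + sigma^2)) / B given in the proof."""
    u = (1.0 - sigma / math.sqrt(2.0 * eps * B + sigma ** 2)) / B
    return u * eps - sigma ** 2 * u ** 2 / (2.0 * (1.0 - u * B))

def eps_from_theta(theta, sigma, B):
    """Deviation level used in Eq. (8)."""
    return math.sqrt(2.0 * theta * sigma ** 2) + theta * B

sigma, B = 1.5, 0.7
for theta in (0.1, 1.0, 5.0):
    eps = eps_from_theta(theta, sigma, B)
    lower = eps ** 2 / (2.0 * sigma ** 2 + 2.0 * eps * B)  # exponent in Eq. (7)
    print(theta, h(eps, sigma, B), lower)
```

For each $\theta$, the second printed value should agree with $\theta$ up to rounding and should be at least the third.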

The following theorem is Theorem 6.2 in (Tropp, 2010), except that Lemma 2 is used to obtain a different form of the bound.

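Theorem 3. If a finite sequence $\{X_k : k = 1, \dots, n\}$ of independent, random, self-adjoint matrices with dimension $d$
all satisfy the Bernstein moment condition, i.e.

$$\mathbb{E}X_k^p \preceq \frac{p!}{2} B^{p-2} \Sigma^2, \quad \text{for } p \ge 2,$$

where $B$ is a positive constant and $\Sigma^2$ is a positive semi-definite matrix, then

$$\log \mathbb{E}\exp(uX_k) \preceq u\,\mathbb{E}X_k + \frac{u^2}{2(1-uB)} \Sigma^2 \quad \text{for all } 0 < u < 1/B,$$

$$\Pr\Big\{ \lambda_1\Big(\sum_k X_k\Big) \ge \lambda_1\Big(\sum_k \mathbb{E}X_k\Big) + \sqrt{2n\theta\lambda_1(\Sigma^2)} + \theta B \Big\} \le d\exp\{-\theta\}.$$

Additionally, if the $X_k$ are positive semi-definite matrices,

$$\log \mathbb{E}\exp(-uX_k) \preceq -u\,\mathbb{E}X_k + \frac{u^2}{2} \Sigma^2 \quad \text{for all } u > 0,$$

$$\Pr\Big\{ \lambda_d\Big(\sum_k X_k\Big) \le \lambda_d\Big(\sum_k \mathbb{E}X_k\Big) - \sqrt{2\theta n\lambda_1(\Sigma^2)} \Big\} \le d\exp\{-\theta\}.$$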
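Proof.

$$\begin{aligned}
\log \mathbb{E}\exp(uX_k) &= \log\Big( I + u\,\mathbb{E}X_k + \sum_{p=2}^{\infty} \frac{u^p}{p!} \mathbb{E}X_k^p \Big) \\
&\preceq \log\Big( I + u\,\mathbb{E}X_k + \frac{u^2}{2} \sum_{p=2}^{\infty} (uB)^{p-2} \Sigma^2 \Big) \\
&\preceq u\,\mathbb{E}X_k + \frac{u^2}{2(1-uB)} \Sigma^2.
\end{aligned}$$

It follows from Theorem 3.6 in (Tropp, 2010) and Lemma 2 that

$$\begin{aligned}
\Pr\Big\{ \lambda_1\Big(\sum_k X_k\Big) \ge \lambda_1\Big(\sum_k \mathbb{E}X_k\Big) + \epsilon \Big\}
&\le \inf_{0<u<1/B} \Big\{ \exp\Big( -u\lambda_1\Big(\sum_k \mathbb{E}X_k\Big) - u\epsilon \Big)\, \mathrm{tr}\exp\Big( \sum_k \log \mathbb{E}\exp(uX_k) \Big) \Big\} \\
&\le \inf_{0<u<1/B} \Big\{ d\exp\Big( -u\epsilon + \frac{nu^2}{2(1-uB)} \lambda_1(\Sigma^2) \Big) \Big\} \\
&\le d\exp(-\theta),
\end{aligned}$$

where $\epsilon = \sqrt{2n\theta\lambda_1(\Sigma^2)} + \theta B$.

For positive semi-definite $X_k$,

$$\log \mathbb{E}\exp(-uX_k) \preceq \log\Big( I - u\,\mathbb{E}X_k + \frac{u^2}{2} \mathbb{E}X_k^2 \Big) \preceq -u\,\mathbb{E}X_k + \frac{u^2}{2} \Sigma^2,$$

then

$$\begin{aligned}
\Pr\Big\{ \lambda_d\Big(\sum_k X_k\Big) \le \lambda_d\Big(\sum_k \mathbb{E}X_k\Big) - \epsilon \Big\}
&\le \inf_{u>0} \Big\{ \exp\Big( u\lambda_d\Big(\sum_k \mathbb{E}X_k\Big) - u\epsilon \Big)\, \mathrm{tr}\exp\Big( \sum_k \log \mathbb{E}\exp(-uX_k) \Big) \Big\} \\
&\le \inf_{u>0} \Big\{ d\exp\Big( -u\epsilon + \frac{nu^2}{2} \lambda_1(\Sigma^2) \Big) \Big\} \\
&\le d\exp(-\theta),
\end{aligned}$$

where $\epsilon = \sqrt{2\theta n\lambda_1(\Sigma^2)}$.

□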
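The first step of the proof rests on the scalar identity $\sum_{p \ge 2} \frac{u^p}{p!} \cdot \frac{p!}{2} B^{p-2} = \frac{u^2}{2(1-uB)}$ for $0 < uB < 1$, which is a geometric series. A minimal numerical check, with illustrative function names and values that are not taken from the note:

```python
def bernstein_series(u, B, terms=200):
    """Partial sum of sum_{p>=2} (u^p / p!) * (p!/2) * B^(p-2)."""
    return sum(0.5 * u ** p * B ** (p - 2) for p in range(2, terms + 2))

def closed_form(u, B):
    """u^2 / (2 (1 - u B)), valid for 0 <= u*B < 1."""
    return u ** 2 / (2.0 * (1.0 - u * B))

print(bernstein_series(0.3, 2.0), closed_form(0.3, 2.0))
```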

Next we establish the Bernstein moment condition for $\xi\xi^\top$ and $\xi\xi^\top - C$.
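Lemma 4. Let $\xi$ be a random vector from $\mathcal{N}_d(0, C)$. For $p \ge 2$,

$$\mathbb{E}(\xi\xi^\top)^p \preceq \frac{p!}{2} B^{p-2} \big( \mathrm{tr}(C)C + 2C^2 \big),$$
$$\mathbb{E}(\xi\xi^\top - C)^p \preceq \frac{p!}{2} B^{p-2} \Sigma^2,$$
$$\mathbb{E}(C - \xi\xi^\top)^p \preceq \frac{p!}{2} B^{p-2} \Sigma^2,$$

where $\Sigma^2 = \mathrm{tr}(C)C + C^2$ and $B = 2\mathrm{tr}(C)$.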

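Proof. Let $X = \xi\xi^\top$ and $\Sigma_p = \mathbb{E}(X - C)^p$, for $p \ge 2$. It follows from Isserlis' theorem (Isserlis, 1918) that

$$(\mathbb{E}X^2)_{ij} = \sum_k \mathbb{E}\,\xi_i \xi_k^2 \xi_j = [\mathbb{E}\xi_i\xi_j]\Big[\sum_k \mathbb{E}\xi_k^2\Big] + 2\sum_k [\mathbb{E}\xi_i\xi_k][\mathbb{E}\xi_k\xi_j] = \mathrm{tr}(C)C_{ij} + 2(C^2)_{ij},$$
$$\Sigma_2 = \mathbb{E}X^2 - C^2 = \mathrm{tr}(C)C + C^2 = \Sigma^2.$$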
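The second-moment identity $\mathbb{E}X^2 = \mathrm{tr}(C)C + 2C^2$ can be reproduced mechanically by summing over pairings, which is all Isserlis' theorem does. The sketch below is a small pure-Python illustration; the helper names and the $2 \times 2$ example matrix are assumptions made for this example.

```python
def pairings(items):
    """Yield every perfect matching of `items` (a list of even length)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def gaussian_moment(indices, C):
    """E[xi_{i1} ... xi_{i2m}] for xi ~ N(0, C), by Isserlis' theorem."""
    total = 0.0
    for matching in pairings(list(range(len(indices)))):
        prod = 1.0
        for a, b in matching:
            prod *= C[indices[a]][indices[b]]
        total += prod
    return total

def second_moment(C):
    """(E X^2)_{ij} = sum_k E[xi_i xi_k xi_k xi_j] with X = xi xi^T."""
    d = len(C)
    return [[sum(gaussian_moment([i, k, k, j], C) for k in range(d))
             for j in range(d)] for i in range(d)]

C = [[2.0, 1.0], [1.0, 3.0]]
print(second_moment(C))  # to be compared with tr(C) C + 2 C^2
```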
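Then, we calculate $\mathbb{E}X^3$ and $\Sigma_3$ to get the basic idea.

$$\mathbb{E}X^3 = \mathrm{tr}(C)^2 C + 2\mathrm{tr}(C^2)C + 4\mathrm{tr}(C)C^2 + 8C^3 \preceq 5\mathrm{tr}(C)\big( \mathrm{tr}(C)C + 2C^2 \big) \preceq \frac{3!}{2} B^{3-2} \big( \mathrm{tr}(C)C + 2C^2 \big),$$

$$\begin{aligned}
\Sigma_3 &= \mathbb{E}X^3 - \mathbb{E}XCX - \mathbb{E}X^2C - \mathbb{E}CX^2 + \mathbb{E}C^2X + \mathbb{E}CXC + \mathbb{E}XC^2 - C^3 \\
&= \mathbb{E}X^3 - \mathbb{E}XCX - 2\big( \mathrm{tr}(C)C^2 + C^3 \big) \\
&= \big( \mathrm{tr}(C)^2 C + 2\mathrm{tr}(C^2)C + 4\mathrm{tr}(C)C^2 + 8C^3 \big) - \big( \mathrm{tr}(C^2)C + 2C^3 \big) - 2\big( \mathrm{tr}(C)C^2 + C^3 \big) \\
&= \mathrm{tr}(C)^2 C + \mathrm{tr}(C^2)C + 2\mathrm{tr}(C)C^2 + 4C^3 \preceq 4\mathrm{tr}(C)\big( \mathrm{tr}(C)C + C^2 \big) \preceq \frac{3!}{2} B^{3-2} \Sigma^2,
\end{aligned}$$

$$\mathbb{E}(C - X)^3 = -\Sigma_3 \preceq 0 \preceq \frac{3!}{2} B^{3-2} \Sigma^2.$$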

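Let $Z_{k,i} = \prod_j Y_{k,i,j}$, where $Y_{k,i,j}$ is $X$ or $C$, $k$ is the number of $C$'s in the term (between $0$ and $p$), $i$ indexes the terms
(between $1$ and $\binom{p}{k}$), and $j$ is between $1$ and $p$. Each element of $Y_{k,i,j}$ can be written as $\xi_{i_{j-1}}\xi_{i_j}$ or $C_{i_{j-1},i_j}$, where $i_j$
is between $1$ and $d$. It follows from Isserlis' theorem that the expectation of each element of $\mathbb{E}Z_{k,i}$ is the sum of the products
of the expectations over all pairings. For example, for $p = 3$, we write $Z_{1,2} = XCX$, then

$$\begin{aligned}
\mathbb{E}Z_{1,2} &= \Big( \mathbb{E}\sum_{i_1,i_2} \xi_{i_0}\xi_{i_1} C_{i_1,i_2} \xi_{i_2}\xi_{i_3} : i_0, i_3 \in \{1, \dots, d\} \Big) \\
&= \Big( \sum_{i_1,i_2} \big\{ \mathbb{E}(\xi_{i_0}\xi_{i_1}) C_{i_1,i_2} \mathbb{E}(\xi_{i_2}\xi_{i_3}) + \mathbb{E}(\xi_{i_0}\xi_{i_2}) C_{i_1,i_2} \mathbb{E}(\xi_{i_1}\xi_{i_3}) + \mathbb{E}(\xi_{i_0}\xi_{i_3}) C_{i_1,i_2} \mathbb{E}(\xi_{i_1}\xi_{i_2}) \big\} : i_0, i_3 \in \{1, \dots, d\} \Big) \\
&= [(01)(12)(23)] + [(02)(12)(13)] + [(03)(12)(12)] && (9) \\
&= [(0123)] + [(0213)] + [(03)(121)] && (10) \\
&= C^3 + C^3 + \mathrm{tr}(C^2)C && (11)
\end{aligned}$$

In Eq. (9), each $C$ is written as a pair, and each product as a list. In Eq. (10), pairs are combined into one chain and
several loops. Then in Eq. (11), each chain is $C^c$, where $c$ is the length of the chain, and each loop is $\mathrm{tr}(C^l)$, where $l$
is the length of the loop. In general, $\mathbb{E}Z_{k,i}$ is the sum of terms like $C^c \cdot \prod_j \mathrm{tr}(C^{l_j})$.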
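The worked example $\mathbb{E}XCX = C^3 + C^3 + \mathrm{tr}(C^2)C$ can be confirmed by summing the three pairings of Eq. (9) index by index. The following sketch does so for a small matrix; the helper names and the example $C$ are hypothetical and only serve the illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def expected_XCX(C):
    """E[X C X] for X = xi xi^T, xi ~ N(0, C), summing the pairings of Eq. (9)."""
    d = len(C)
    out = [[0.0] * d for _ in range(d)]
    for i0 in range(d):
        for i3 in range(d):
            for i1 in range(d):
                for i2 in range(d):
                    moment = (C[i0][i1] * C[i2][i3]     # (01)(23): chain 0-1-2-3
                              + C[i0][i2] * C[i1][i3]   # (02)(13): chain 0-2-1-3
                              + C[i0][i3] * C[i1][i2])  # (03)(12): chain 0-3, one loop
                    out[i0][i3] += C[i1][i2] * moment
    return out

def chain_loop_formula(C):
    """2 C^3 + tr(C^2) C, the right-hand side of Eq. (11)."""
    d = len(C)
    C2 = matmul(C, C)
    C3 = matmul(C2, C)
    tr_C2 = sum(C2[i][i] for i in range(d))
    return [[2.0 * C3[i][j] + tr_C2 * C[i][j] for j in range(d)] for i in range(d)]

C = [[2.0, 1.0], [1.0, 3.0]]
print(expected_XCX(C))
print(chain_loop_formula(C))
```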

We have $C^c \preceq \mathrm{tr}(C)^{c-2}C^2 \preceq \mathrm{tr}(C)^{c-1}C$, and $\mathrm{tr}(C^l) \le \mathrm{tr}(C)^l$, so it suffices to count the terms with a singleton chain,
i.e. $c = 1$, and the total number of terms in order to bound the expectations by multiples of $\mathrm{tr}(C)^{p-2}(\mathrm{tr}(C)C + 2C^2)$ or $\mathrm{tr}(C)^{p-2}(\mathrm{tr}(C)C + C^2)$. $\mathbb{E}Z_{k,i}$ is
an expectation of $(2p - 2k)$-order moments, which yields $(2p - 2k - 1)!!$ terms. For a given $k$, we have $\binom{p}{k}(2p - 2k - 1)!!$
terms, assuming $(-1)!! = 1$. A singleton chain term must contain $(0, p)$, thus $Z_{k,i}$ must be of the form $X\big(\prod_{j=2}^{p-1} Y_{k,i,j}\big)X$. For a
given $k$, the number of singleton chain terms is $\binom{p-2}{k}(2p - 2k - 3)!!$.
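For the case $k = 0$ the two counts are $(2p-1)!!$ pairings in total and $(2p-3)!!$ pairings that join the first and the last $\xi$ (the singleton chain). A brute-force enumeration, sketched below with illustrative helper names, reproduces both numbers for small $p$; positions $0, \dots, 2p-1$ label the $2p$ factors $\xi$ in $X^p$ from left to right.

```python
def pairings(items):
    """Yield every perfect matching of `items` (a list of even length)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def double_factorial(m):
    """m!! for odd m >= -1, with (-1)!! = 1 as assumed in the text."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def count_terms(p):
    """(number of pairings, number pairing position 0 with position 2p-1)."""
    last = 2 * p - 1
    total = singleton = 0
    for matching in pairings(list(range(2 * p))):
        total += 1
        if (0, last) in matching:
            singleton += 1
    return total, singleton

for p in range(2, 6):
    print(p, count_terms(p), double_factorial(2 * p - 1), double_factorial(2 * p - 3))
```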

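$\mathbb{E}X^p = \mathbb{E}Z_{0,1}$ has $(2p - 1)!!$ terms, which include $(2p - 3)!!$ singleton chain terms. The number of singleton
chain terms is at most a third of the number of all terms when $p \ge 2$. For $p \ge 2$,

$$\begin{aligned}
\mathbb{E}X^p &\preceq \frac{(2p-1)!!}{3} \big( \mathrm{tr}(C)^{p-1}C + 2\mathrm{tr}(C)^{p-2}C^2 \big) = \frac{\Gamma(p + 1/2)}{3\sqrt{\pi}\,\Gamma(p+1)}\, p!\,2^p \big( \mathrm{tr}(C)^{p-1}C + 2\mathrm{tr}(C)^{p-2}C^2 \big) \\
&\preceq \frac{1}{8}\, p!\,2^p \big( \mathrm{tr}(C)^{p-1}C + 2\mathrm{tr}(C)^{p-2}C^2 \big) = \frac{p!}{2} B^{p-2} \big( \mathrm{tr}(C)C + 2C^2 \big).
\end{aligned}$$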

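Then $\mathbb{E}(X - C)^p = \sum_k (-1)^k \sum_i \mathbb{E}Z_{k,i}$. The number of singleton chain terms is less than half of the number of all
terms. Thus

$$\Sigma_4 \preceq 10\,\mathrm{tr}(C)^{4-1}C + 50\,\mathrm{tr}(C)^{4-2}C^2 \preceq 30\,\mathrm{tr}(C)^{4-1}C + 30\,\mathrm{tr}(C)^{4-2}C^2 \preceq \frac{4!}{2} B^{4-2} \Sigma^2,$$
$$\mathbb{E}(C - X)^4 = \Sigma_4 \preceq \frac{4!}{2} B^{4-2} \Sigma^2.$$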
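When $p \ge 5$,

$$\begin{aligned}
\Sigma_p \preceq \mathbb{E}X^p + C^p &\preceq \frac{(2p-1)!! + 1}{2} \big( \mathrm{tr}(C)^{p-1}C + \mathrm{tr}(C)^{p-2}C^2 \big) \\
&= \Big( \frac{\Gamma(p + 1/2)}{2\sqrt{\pi}\,\Gamma(p+1)} + \frac{1}{p!\,2^{p+1}} \Big)\, p!\,2^p\, \mathrm{tr}(C)^{p-2} \big( \mathrm{tr}(C)C + C^2 \big) \\
&\preceq 0.1232 \times p!\,2^p\, \mathrm{tr}(C)^{p-2} \big( \mathrm{tr}(C)C + C^2 \big) \preceq \frac{p!}{2} B^{p-2} \Sigma^2,
\end{aligned}$$

$$\mathbb{E}(C - X)^p \preceq \mathbb{E}X^p + C^p \preceq \frac{p!}{2} B^{p-2} \Sigma^2.$$

□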
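The numerical constants in the proof of Lemma 4 follow from $(2p-1)!! = 2^p\,\Gamma(p + 1/2)/\sqrt{\pi}$. The sketch below evaluates the two coefficients of $p!\,2^p$, one for the bound on $\mathbb{E}X^p$ and one for the bound on $\mathbb{E}X^p + C^p$, so they can be compared with $1/8$ (for $p \ge 2$) and $0.1232$ (for $p \ge 5$). The function names are assumptions made for illustration.

```python
import math

def ratio(p):
    """(2p-1)!! / (p! 2^p) = Gamma(p + 1/2) / (sqrt(pi) Gamma(p + 1))."""
    return math.gamma(p + 0.5) / (math.sqrt(math.pi) * math.gamma(p + 1))

def uncentered_constant(p):
    """Coefficient of p! 2^p in the bound on E X^p: ratio(p) / 3."""
    return ratio(p) / 3.0

def centered_constant(p):
    """Coefficient of p! 2^p in the bound on E X^p + C^p (used for p >= 5)."""
    return ratio(p) / 2.0 + 1.0 / (math.factorial(p) * 2 ** (p + 1))

for p in range(2, 9):
    print(p, uncentered_constant(p), centered_constant(p))
```

Since `ratio(p)` decreases in $p$, checking the smallest admissible $p$ in each case is enough.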



Now we prove Theorem 1.

Proof of Theorem 1. Let $X_k = \xi_k\xi_k^\top - C$. We have $\mathbb{E}X_k = 0$, $\lambda_1(\Sigma^2) \le (r + 1)\lambda_1(C)^2$, and $B = 2r\lambda_1(C)$. Then
Eq. (1) follows from Lemma 4 and Theorem 3. Similarly, letting $X_k = C - \xi_k\xi_k^\top$, we prove Eq. (2). Combining them with
$\|C\| = \lambda_1(C)$, we have Eq. (3). Plugging in $\lambda_1(C) = \kappa_\ell\lambda_\ell(C)$, Eq. (4) follows from Weyl's theorem on eigenvalues, specifically,

$$\lambda_\ell\big(\tfrac{1}{n}S\big) \le \lambda_\ell(C) + \lambda_1\big(\tfrac{1}{n}S - C\big),$$
$$\lambda_\ell(C) \le \lambda_\ell\big(\tfrac{1}{n}S\big) + \lambda_1\big(C - \tfrac{1}{n}S\big).$$

□
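The Weyl step can be illustrated on a simulated $2 \times 2$ example, where eigenvalues have a closed form. The sketch below draws $\xi_k = Lz_k$ with $C = LL^\top$, forms $\frac{1}{n}S$, and reports the slack in $\lambda_\ell(\frac{1}{n}S) \le \lambda_\ell(C) + \lambda_1(\frac{1}{n}S - C)$ for $\ell = 1, 2$. The matrix $L$, the sample size, the seed, and all helper names are assumptions chosen for the illustration.

```python
import math
import random

def eig2(M):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return [mean + radius, mean - radius]

def sample_covariance(L, n, rng):
    """(1/n) S for n draws xi = L z, z ~ N(0, I_2), so that C = L L^T."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        xi = [L[0][0] * z[0] + L[0][1] * z[1], L[1][0] * z[0] + L[1][1] * z[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += xi[i] * xi[j] / n
    return S

def weyl_gaps(C, S_hat):
    """Slack in lambda_l(S_hat) <= lambda_l(C) + lambda_1(S_hat - C), l = 1, 2."""
    diff = [[S_hat[i][j] - C[i][j] for j in range(2)] for i in range(2)]
    top = eig2(diff)[0]
    return [lc + top - ls for lc, ls in zip(eig2(C), eig2(S_hat))]

rng = random.Random(0)
L = [[1.0, 0.0], [0.5, 1.5]]  # lower-triangular factor, C = L L^T
C = [[1.0, 0.5], [0.5, 2.5]]
S_hat = sample_covariance(L, 500, rng)
print(weyl_gaps(C, S_hat))  # nonnegative up to rounding, by Weyl's inequality
```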